AITopics

Technology: Information Technology > Artificial Intelligence (0.43)

arXiv.org Artificial Intelligence, Sep-25-2025

Solving Freshness in RAG: A Simple Recency Prior and the Limits of Heuristic Trend Detection

Grofsky, Matthew

We address temporal failures in RAG systems using two methods on cybersecurity data. A simple recency prior achieved an accuracy of 1.00 on freshness tasks. In contrast, a clustering heuristic for topic evolution failed (0.08 F1-score), showing trend detection requires methods beyond simple heuristics.
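A minimal sketch of what a recency prior for retrieval re-ranking could look like. The abstract does not give the formula, so the exponential half-life decay, the linear fusion weight `alpha`, and the function names below are illustrative assumptions, not the paper's actual method.

```python
from datetime import datetime, timezone


def recency_weight(doc_time, now, half_life_days=14.0):
    """Exponential decay in [0, 1]: the weight halves every `half_life_days`.

    The half-life form is an assumption made for illustration.
    """
    age_days = max((now - doc_time).total_seconds() / 86400.0, 0.0)
    return 0.5 ** (age_days / half_life_days)


def rerank(candidates, now, alpha=0.7, half_life_days=14.0):
    """Fuse semantic similarity with the recency weight and sort descending.

    candidates: list of (doc_id, similarity, timestamp) tuples.
    alpha: hypothetical mixing weight between similarity and recency.
    """
    scored = [
        (doc_id, alpha * sim + (1.0 - alpha) * recency_weight(ts, now, half_life_days))
        for doc_id, sim, ts in candidates
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


now = datetime(2025, 1, 29, tzinfo=timezone.utc)
candidates = [
    ("old_advisory", 0.80, datetime(2024, 1, 1, tzinfo=timezone.utc)),
    ("new_advisory", 0.75, datetime(2025, 1, 15, tzinfo=timezone.utc)),
]
ranking = rerank(candidates, now)
```

In this toy case the slightly less similar but much fresher document ranks first, which is the behavior a freshness task rewards.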

artificial intelligence, machine learning, natural language, (16 more...)

doi: 10.36227/techrxiv.175832475.57637876/v1

2509.19376

Genre: Research Report > New Finding (0.68)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

arXiv.org Artificial Intelligence, Jun-30-2025

TIM: A Large-Scale Dataset and large Timeline Intelligence Model for Open-domain Timeline Summarization

Hu, Chuanrui, Hu, Wei, Yu, Penghang, Zhang, Hua, Bao, Bing-Kun

Open-domain Timeline Summarization (TLS) is crucial for monitoring the evolution of news topics. To identify changes in news topics, existing methods typically employ general Large Language Models (LLMs) to summarize relevant timestamps from retrieved news. While general LLMs demonstrate capabilities in zero-shot news summarization and timestamp localization, they struggle with assessing topic relevance and understanding topic evolution. Consequently, the summarized information often includes irrelevant details or inaccurate timestamps. To address these issues, we propose the first large Timeline Intelligence Model (TIM) for open-domain TLS, which is capable of effectively summarizing open-domain timelines. Specifically, we begin by presenting a large-scale TLS dataset, comprising over 1,000 news topics and more than 3,000 annotated TLS instances. Furthermore, we propose a progressive optimization strategy, which gradually enhances summarization performance. It employs instruction tuning to enhance summarization and topic-irrelevant information filtering capabilities. Following this, it exploits a novel dual-alignment reward learning method that incorporates both semantic and temporal perspectives, thereby improving the understanding of topic evolution principles. Through this progressive optimization strategy, TIM demonstrates a robust ability to summarize open-domain timelines. Extensive open-domain experiments demonstrate the effectiveness of TIM.

large language model, machine learning, natural language, (17 more...)

2506.21616

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Austria > Vienna (0.14)
Asia > China > Jiangsu Province > Nanjing (0.04)
(2 more...)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems, May-27-2025, 09:31:37 GMT

Scalable inference of topic evolution via models for latent geometric structures

latent geometric structure, scalable inference, topic evolution, (1 more...)

Technology: Information Technology > Artificial Intelligence (0.50)

Neural Information Processing Systems, Jan-22-2025, 17:34:07 GMT

Reviews: Scalable inference of topic evolution via models for latent geometric structures

This is a very well-written paper, both in style and substance. There are a few stylistic peculiarities that could surely be resolved by thorough proofreading. The authors present a nice introduction to the idea of modelling sets of topics, i.e. sets of points on a simplex, as the geometric structure of a polytope. They go on to describe how the evolution of such a polytope can be modelled over time by embedding a unit hypersphere into the simplex and modelling polytope evolution as random trajectories over this sphere. They further present a non-parametric hierarchical model for capturing polytopes with a varying number of topics and also multiple polytopes arising from different corpora.

evolution, geometric structure, latent geometric structure, (5 more...)

Genre: Summary/Review (0.61)

Technology: Information Technology > Artificial Intelligence (0.41)

arXiv.org Artificial Intelligence, Jan-7-2025

Detecting Neurocognitive Disorders through Analyses of Topic Evolution and Cross-modal Consistency in Visual-Stimulated Narratives

Li, Jinchao, Wang, Yuejiao, Li, Junan, Kang, Jiawen, Zheng, Bo, Wong, Simon, Mak, Brian, Fung, Helene, Woo, Jean, Mak, Man-Wai, Kwok, Timothy, Mok, Vincent, Gong, Xianmin, Wu, Xixin, Liu, Xunying, Wong, Patrick, Meng, Helen

Early detection of neurocognitive disorders (NCDs) is crucial for timely intervention and disease management. Speech analysis offers a non-intrusive and scalable screening method, particularly through narrative tasks in neuropsychological assessment tools. Traditional narrative analysis often focuses on local indicators in microstructure, such as word usage and syntax. While these features provide insights into language production abilities, they often fail to capture global narrative patterns, or macrostructures. Macrostructures include coherence, thematic organization, and logical progressions, reflecting essential cognitive skills potentially critical for recognizing NCDs. Addressing this gap, we propose to investigate specific cognitive and linguistic challenges by analyzing topical shifts, temporal dynamics, and the coherence of narratives over time, aiming to reveal cognitive deficits by identifying narrative impairments, and exploring their impact on communication and cognition. The investigation is based on the CU-MARVEL Rabbit Story corpus, which comprises recordings of a story-telling task from 758 older adults. We developed two approaches: the Dynamic Topic Models (DTM)-based temporal analysis to examine the evolution of topics over time, and the Text-Image Temporal Alignment Network (TITAN) to evaluate the coherence between spoken narratives and visual stimuli. The DTM-based approach validated the effectiveness of dynamic topic consistency as a macrostructural metric (F1=0.61, AUC=0.78). The TITAN approach achieved the highest performance (F1=0.72, AUC=0.81), surpassing established microstructural and macrostructural feature sets. Cross-comparison and regression tasks further demonstrated the effectiveness of the proposed dynamic macrostructural modeling approaches for NCD detection.

large language model, machine learning, natural language, (20 more...)

2501.03727

Country:

Asia > China > Hong Kong (0.05)
North America > United States (0.04)
Europe > Austria > Vienna (0.04)

Genre:

Research Report > New Finding (0.93)
Overview (0.93)

Industry:

Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.75)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.69)
Health & Medicine > Therapeutic Area > Neurology > Dementia (0.47)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
(2 more...)

arXiv.org Artificial Intelligence, Nov-21-2024

BERTrend: Neural Topic Modeling for Emerging Trends Detection

Boutaleb, Allaa, Picault, Jerome, Grosjean, Guillaume

Detecting and tracking emerging trends and weak signals in large, evolving text corpora is vital for applications such as monitoring scientific literature, managing brand reputation, surveilling critical infrastructure, and more generally any kind of text-based event detection. Existing solutions often fail to capture the nuanced context or dynamically track evolving patterns over time. BERTrend, a novel method, addresses these limitations using neural topic modeling in an online setting. It introduces a new metric to quantify topic popularity over time by considering both the number of documents and update frequency. This metric classifies topics as noise, weak, or strong signals, flagging emerging, rapidly growing topics for further investigation. Experimentation on two large real-world datasets demonstrates BERTrend's ability to accurately detect and track meaningful weak signals while filtering out noise, offering a comprehensive solution for monitoring emerging trends in large-scale, evolving text corpora. The method can also be used for retrospective analysis of past events. In addition, the use of Large Language Models together with BERTrend offers an efficient means of interpreting trends of events.
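As a rough illustration of a popularity metric that depends on both document counts and update frequency, the sketch below accumulates new documents per time slice and decays a topic's popularity while it receives no updates. The decay form, the fixed thresholds `low` and `high`, and all function names are assumptions made for this example; the abstract does not specify BERTrend's actual formula or thresholds.

```python
import math


def update_popularity(popularity, new_docs, idle_slices, decay=0.01):
    """Add newly assigned documents, or decay when the topic got none.

    idle_slices: number of consecutive time slices without new documents.
    The exponential decay is an illustrative assumption.
    """
    if new_docs > 0:
        return popularity + new_docs
    return popularity * math.exp(-decay * idle_slices)


def classify(popularity, low, high):
    """Map a popularity value to a signal class using two thresholds."""
    if popularity < low:
        return "noise"
    if popularity < high:
        return "weak"
    return "strong"


def track(doc_counts, low=2.0, high=10.0, decay=0.01):
    """Return the signal class of one topic after each time slice."""
    popularity, idle, labels = 0.0, 0, []
    for n in doc_counts:
        idle = 0 if n > 0 else idle + 1
        popularity = update_popularity(popularity, n, idle, decay)
        labels.append(classify(popularity, low, high))
    return labels


growing = track([1, 2, 4, 8])
fading = track([3, 0, 0], decay=0.5)
```

Here `growing` moves from noise through weak to strong as documents accumulate, while `fading` drops back to noise once updates stop. Thresholds derived from the observed popularity distribution would be a natural alternative to the fixed values used here.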

evolution, threshold, weak signal, (16 more...)

2411.0593

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Colorado (0.04)
Europe > Spain (0.04)
(12 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.48)
Research Report > Promising Solution (0.48)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Therapeutic Area > Vaccines (0.68)
Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Neural Information Processing Systems, Oct-9-2024, 18:57:42 GMT

Scalable inference of topic evolution via models for latent geometric structures

latent geometric structure, scalable inference, topic evolution, (1 more...)

Technology: Information Technology > Artificial Intelligence (0.50)

arXiv.org Artificial Intelligence, May-28-2024

Modeling Dynamic Topics in Chain-Free Fashion by Evolution-Tracking Contrastive Learning and Unassociated Word Exclusion

Wu, Xiaobao, Dong, Xinshuai, Pan, Liangming, Nguyen, Thong, Luu, Anh Tuan

Dynamic topic models track the evolution of topics in sequential documents and have given rise to applications such as trend analysis and opinion mining. However, existing models suffer from repetitive topic and unassociated topic issues, failing to reveal the evolution and hindering further applications. To address these issues, we break the tradition of simply chaining topics in existing work and propose a novel neural Chain-Free Dynamic Topic Model (CFDTM). We introduce a new evolution-tracking contrastive learning method that builds the similarity relations among dynamic topics. This not only tracks topic evolution but also maintains topic diversity, mitigating the repetitive topic issue. To avoid unassociated topics, we further present an unassociated word exclusion method that consistently excludes unassociated words from discovered topics. Extensive experiments demonstrate that our model significantly outperforms state-of-the-art baselines, tracking topic evolution with high-quality topics, showing better performance on downstream tasks, and remaining robust to the hyperparameter for evolution intensities. Our code is available at https://github.com/bobxwu/CFDTM.

evolution, topic issue, topic model, (15 more...)

2405.17957

Country:

Asia > Singapore (0.04)
Europe > United Kingdom (0.04)
Europe > Russia (0.04)
(12 more...)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.93)
Government (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.89)
(2 more...)

arXiv.org Artificial Intelligence, Nov-15-2023

VIBE: Topic-Driven Temporal Adaptation for Twitter Classification

Zhang, Yuji, Li, Jing, Li, Wenjie

Language features evolve in real-world social media, causing text classification performance to deteriorate over time. To address this challenge, we study temporal adaptation, where models trained on past data are tested in the future. Most prior work focused on continued pretraining or knowledge updating, which may compromise performance on noisy social media data. To tackle this issue, we reflect feature change via modeling latent topic evolution and propose a novel model, VIBE: Variational Information Bottleneck for Evolutions. Concretely, we first employ two Information Bottleneck (IB) regularizers to distinguish past and future topics. Then, the distinguished topics work as adaptive features via multi-task training with timestamp and class label prediction. In adaptive learning, VIBE utilizes unlabeled data retrieved from online streams created after the training data period. Extensive Twitter experiments on three classification tasks show that our model, with only 3% of data, significantly outperforms previous state-of-the-art continued-pretraining methods.

adaptive data, computational linguistic, tweet, (15 more...)

2310.10191

Country:

North America > Dominican Republic (0.04)
Europe > Italy > Tuscany > Florence (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(9 more...)

Genre: Research Report (1.00)

Industry:

Information Technology > Services (0.82)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)